serverless inference
ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs
Yifan Sui, Hao Wang, Hanfei Yu, Yitao Hu, Jianxun Li, Hao Wang
Serverless computing has grown rapidly for serving Large Language Model (LLM) inference due to its pay-as-you-go pricing, fine-grained GPU usage, and rapid scaling. However, our analysis reveals that current serverless platforms can effectively serve general LLM inference but fail with Low-Rank Adaptation (LoRA) inference due to three key limitations: 1) massive parameter redundancy among functions, where 99% of weights are unnecessarily duplicated, 2) costly artifact loading latency beyond LLM loading, and 3) magnified resource contention when serving multiple LoRA LLMs. These inefficiencies lead to massive GPU waste, increased Time-To-First-Token (TTFT), and high monetary costs. We propose ServerlessLoRA, a novel serverless inference system designed for faster and cheaper LoRA LLM serving. ServerlessLoRA enables secure backbone LLM sharing across isolated LoRA functions to reduce redundancy. We design a pre-loading method that fetches comprehensive LoRA artifacts in advance to minimize cold-start latency. Furthermore, ServerlessLoRA employs contention-aware batching and offloading to mitigate GPU resource conflicts during bursty workloads. Experiments on industrial workloads demonstrate that ServerlessLoRA reduces TTFT by up to 86% and cuts monetary costs by up to 89% compared to state-of-the-art LLM inference solutions.
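The backbone sharing the abstract describes works because LoRA keeps the large weight matrix W frozen and adds only a scaled low-rank update, y = Wx + (α/r)·B·A·x, so many adapters can attach to one shared copy of W. A minimal pure-Python sketch of that forward pass (matrix values, dimensions, and the `lora_forward` helper are illustrative, not from the paper):

```python
def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(r * x for r, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16, r=2):
    # Frozen backbone output plus the scaled low-rank update B @ (A @ x).
    # W is never modified, which is what allows sharing it across adapters.
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen backbone weight (shared)
A = [[0.1, 0.2]]               # rank r=1 down-projection (per adapter)
B = [[0.5], [0.5]]             # up-projection (per adapter)
x = [1.0, 1.0]
y = lora_forward(W, A, B, x, alpha=2, r=1)
```

Because each adapter contributes only the small A and B matrices, serving N adapters against one resident backbone avoids the N-fold weight duplication the paper identifies.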
Reduce the time taken to deploy your models to Amazon SageMaker for testing
Data scientists often train their models locally and look for a proper hosting service to deploy their models. Unfortunately, there's no one set mechanism or guide to deploying pre-trained models to the cloud. In this post, we look at deploying trained models to Amazon SageMaker hosting to reduce your deployment time. SageMaker is a fully managed machine learning (ML) service. With SageMaker, you can quickly build and train ML models and directly deploy them into a production-ready hosted environment.
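Before SageMaker hosting can serve a locally trained model, the artifacts must be packaged as a `model.tar.gz` archive with the model files at the archive root, then uploaded to S3 and referenced when creating the model. A minimal packaging sketch using only the standard library (file names and the `package_model` helper are illustrative):

```python
import tarfile
import tempfile
from pathlib import Path

def package_model(artifact_paths, out_path):
    # Bundle trained model files into the model.tar.gz layout
    # SageMaker hosting expects: artifacts at the archive root.
    with tarfile.open(out_path, "w:gz") as tar:
        for p in artifact_paths:
            tar.add(p, arcname=Path(p).name)
    return out_path

# Illustrative usage with a placeholder weights file.
workdir = Path(tempfile.mkdtemp())
weights = workdir / "model.pth"
weights.write_bytes(b"\x00" * 16)  # stand-in for real trained weights
archive = package_model([weights], workdir / "model.tar.gz")
```

The resulting archive would then be uploaded to S3 (for example with boto3's `upload_file`) and passed as the model data location when creating the SageMaker model and endpoint.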
SageMaker Serverless Inference illustrates Amazon's philosophy for ML workloads
We are excited to bring Transform 2022 back in-person July 19 and virtually July 20 - 28. Join AI and data leaders for insightful talks and exciting networking opportunities. Amazon just unveiled Serverless Inference, a new option for SageMaker, its fully managed machine learning (ML) service. The goal for Amazon SageMaker Serverless Inference is to serve use cases with intermittent or infrequent traffic patterns, lowering total cost of ownership (TCO) and making the service easier to use. VentureBeat connected with Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless fits into the big picture of Amazon's machine learning offering and how it affects ease of use and TCO, as well as Amazon's philosophy and process in developing its machine learning portfolio. Inference is the productive phase of ML-powered applications.
Automating machine learning lifecycle with AWS
The machine learning and data science lifecycle involves several phases. Each phase requires complex tasks executed by different teams, as explained by Microsoft in this article. To manage this complexity, cloud providers like Amazon, Microsoft, and Google offer services that automate these tasks and speed up the end-to-end machine learning lifecycle. This article explains the Amazon Web Services (AWS) cloud services used for different tasks in a machine learning lifecycle. For each service, I will give a brief description, a use case, and a link to the documentation. In this article, "machine learning lifecycle" can be read interchangeably with "data science lifecycle."
Deploying ML models using SageMaker Serverless Inference (Preview)
Amazon SageMaker Serverless Inference (Preview) was recently announced at re:Invent 2021 as a new model hosting feature that lets customers serve model predictions without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. Serverless Inference is a new deployment capability that complements SageMaker's existing options for deployment, which include: SageMaker Real-Time Inference for workloads with low latency requirements on the order of milliseconds, SageMaker Batch Transform to run predictions on batches of data, and SageMaker Asynchronous Inference for inferences with large payload sizes or long processing times. Serverless Inference means that you don't need to configure and manage the underlying infrastructure hosting your models. When you host your model on a Serverless Inference endpoint, you simply select the memory size and the maximum number of concurrent invocations. Then, SageMaker will automatically provision, scale, and terminate compute capacity based on the inference request volume.
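The two knobs mentioned above, memory size and maximum concurrency, map onto the `ServerlessConfig` block of a SageMaker endpoint configuration. A hedged sketch that only builds the request payload (the model name is hypothetical, the validation ranges reflect documented limits at the time of writing, and the actual boto3 call is left as a comment):

```python
def serverless_endpoint_config(model_name, memory_mb=2048, max_concurrency=5):
    # Serverless Inference accepts memory sizes from 1024 to 6144 MB
    # in 1024 MB increments; concurrency is capped per endpoint.
    if memory_mb not in range(1024, 6145, 1024):
        raise ValueError("memory_mb must be 1024-6144 in 1024 MB steps")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    return {
        "EndpointConfigName": f"{model_name}-serverless",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }

# With AWS credentials configured, the payload would be sent with boto3:
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**serverless_endpoint_config("my-model"))
cfg = serverless_endpoint_config("my-model", memory_mb=4096, max_concurrency=10)
```

Validating the limits client-side, as sketched here, surfaces configuration mistakes before a round trip to the API.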
AWS Launches SageMaker Studio Lab, Free Tool to Learn and Experiment with Machine Learning
AWS has introduced SageMaker Studio Lab, a free service to help developers learn machine-learning techniques and experiment with the technology. SageMaker Studio Lab provides users with all of the basics to get started, including a JupyterLab IDE, model training on CPUs and GPUs and 15 GB of persistent storage. SageMaker Studio Lab has all the basics to create data analytics, scientific computing, and machine-learning projects with notebooks, which can be easily imported and exported via the Git repo or a private Amazon S3 bucket. SageMaker Studio Lab becomes an alternative to the popular Google Colab environment, providing free CPU/GPU access. Another enhancement for AWS SageMaker is a visual, no-code tool called SageMaker Canvas.
Top 12 AI and machine learning announcements at AWS re:Invent 2021
This week during its re:Invent 2021 conference in Las Vegas, Amazon announced a slew of new AI and machine learning products and updates across its Amazon Web Services (AWS) portfolio. Touching on DevOps, big data, and analytics, among the highlights were a call summarization feature for Amazon Lex and a capability in CodeGuru that helps detect secrets in source code. Amazon's continued embrace of AI comes as enterprises express a willingness to pilot automation technologies in transitioning their businesses online. Fifty-two percent of companies accelerated their AI adoption plans because of the COVID pandemic, according to a PricewaterhouseCoopers study. Meanwhile, Harris Poll found that 55% of companies accelerated their AI strategy in 2020 and 67% expect to further accelerate their strategy in 2021.